Quickstart#

Here, we’ll go through a very basic example of reconstructing, preprocessing, and analyzing 3D face data from video using Medusa’s Python API. For more information about its command-line interface, check the CLI documentation!

We’ll use a short video to reconstruct, shown below:

import os  # need 'egl' for rendering!
os.environ['PYOPENGL_PLATFORM'] = 'egl'
from IPython.display import Video

from medusa.data import get_example_video
vid = get_example_video()

# Show in notebook
Video(vid, embed=True)  

For this example, we’ll use the Mediapipe Face Mesh model to reconstruct the face in the video in 3D. We are going to use the high-level videorecon function from Medusa, which reconstructs the video frame by frame and returns a MediapipeData object, which contains all reconstruction data.

from medusa.preproc import videorecon
data = videorecon(vid, recon_model_name='mediapipe', loglevel='WARNING')
2022-07-26 13:46 [INFO   ]  NumExpr defaulting to 2 threads.
2022-07-26 13:46 [INFO   ]  OpenGL_accelerate module loaded
2022-07-26 13:46 [INFO   ]  Using accelerated ArrayDatatype
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Great! Now let’s inspect the data variable. The reconstructed vertices are stored in the attribute v, a 3D numpy array of shape \(T\) (time points) \(\times\) \(V\) (vertices) \(\times\) 3 (X, Y, Z):

print("`v` is of type: ", type(data.v))
print("`v` has shape: ", data.v.shape)
`v` is of type:  <class 'numpy.ndarray'>
`v` has shape:  (232, 468, 3)
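Because v is a regular numpy array, you can slice it like any other array, e.g., to extract the trajectory of a single vertex over time. The sketch below uses a random array with the same shape as `data.v` so it runs standalone; the vertex index is an arbitrary example, not a specific landmark:

```python
import numpy as np

# Stand-in for `data.v` from the reconstruction above: a random array
# with the same shape (232 frames, 468 vertices, 3 coordinates)
rng = np.random.default_rng(0)
v = rng.standard_normal((232, 468, 3))

# Trajectory of a single vertex (index 4, chosen arbitrarily) over time
traj = v[:, 4, :]  # shape: (232, 3)

# Per-frame Euclidean displacement relative to the first frame
disp = np.linalg.norm(traj - traj[0], axis=-1)  # shape: (232,)
print(disp.shape)
```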

The data contained in v represents, for each time point, the 3D coordinates of the vertices (“points”) that describe the shape of the face. A nice way to visualize these vertices is as a “wireframe” on top of the original video. Each data object in Medusa has a render_video method that renders the reconstructed data as a video.

We do this below. By setting the video parameter to the path of the video, we tell the render_video method to render the wireframe on top of the original video:

f_out = './example_vid_recon.mp4'
data.render_video(f_out, wireframe=True, video=vid)

# Show in notebook
Video('./example_vid_recon.mp4', embed=True)

That looks pretty good! However, there are two problems with the data as it is now. First, each vertex represents both “global” or “rigid” movement (i.e., the face moving left/right/up/down and rotating) and “local” or “non-rigid” movement (i.e., facial expressions such as smiling and frowning). Second, part of these rigid movements seems to reflect noisy “jitter”: small inaccuracies in the reconstruction.

We can separate global and local movement by aligning the reconstructions across time, not unlike how motion correction is done in functional MRI preprocessing. We can use the align function from Medusa for this:

from medusa.preproc import align
data = align(data)
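To get an intuition for what rigid alignment does, the sketch below implements a Procrustes-style fit with the Kabsch algorithm in plain numpy: it removes translation and rotation between two meshes while leaving non-rigid (expression) differences intact. This is an illustration of the general idea, not Medusa’s actual implementation:

```python
import numpy as np

def kabsch_align(source, target):
    """Rigidly align `source` (V x 3) to `target` (V x 3) using the
    Kabsch algorithm: translation + rotation only."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    S, T = source - mu_s, target - mu_t
    U, _, Vt = np.linalg.svd(S.T @ T)
    d = np.sign(np.linalg.det(U @ Vt))   # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt  # optimal rotation
    return (source - mu_s) @ R + mu_t

# Toy example: a rotated + translated copy aligns back (near-)perfectly
rng = np.random.default_rng(42)
target = rng.standard_normal((468, 3))
angle = np.pi / 6
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
source = target @ Rz + np.array([0.5, -0.2, 1.0])
aligned = kabsch_align(source, target)
print(np.abs(aligned - target).max())  # close to zero
```

After such an alignment, whatever variation remains across time points reflects non-rigid movement (plus reconstruction noise), which is exactly what makes the aligned data suitable for analyzing facial expressions.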

Let’s visualize it again. By default, Medusa aligns the data to the first time point:

f_out = './example_vid_recon.mp4'
data.render_video(f_out, wireframe=True, video=vid)

Video('./example_vid_recon.mp4', embed=True)

We can further preprocess the data by applying a temporal band-pass filter using the filter function from Medusa:

from medusa.preproc import filter
data = filter(data, low_pass=4, high_pass=0.005)
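To see what such a band-pass filter does, the sketch below applies a Butterworth filter from scipy to a simulated single-vertex time series: the slow drift and fast jitter are attenuated while the expression-like signal in between passes through. The 30 fps sampling rate and the filter design are assumptions for illustration, not necessarily how Medusa’s filter function works internally:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Assumed sampling rate (~30 fps video); cutoffs mirror the call above
fs = 30.0
sos = butter(4, [0.005, 4.0], btype='bandpass', fs=fs, output='sos')

# Simulated vertex coordinate: slow drift + expression-like ~1 Hz
# signal + fast ~10 Hz jitter
t = np.arange(232) / fs
signal = 0.5 * t + np.sin(2 * np.pi * 1.0 * t) \
         + 0.2 * np.sin(2 * np.pi * 10.0 * t)

filtered = sosfiltfilt(sos, signal)  # zero-phase filtering
print(filtered.shape)
```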

This time, let’s render it without the original video in the background (as it’s not aligned anymore, anyway) and let’s render a smooth mesh instead of a wireframe:

f_out = './example_vid_recon.mp4'
data.render_video(f_out, smooth=True, video=None)

Video('./example_vid_recon.mp4', embed=True)

There is a lot more functionality in Medusa, including different reconstruction models, additional preprocessing functions, and analysis options. A great way to explore this is to check out the tutorials!